Regularized minimum variance distortionless response-based cepstral features for robust continuous speech recognition

Authors

Md. Jahangir Alam

Patrick Kenny

Douglas D. O'Shaughnessy

Abstract

In this paper, we present robust feature extractors that estimate the speech power spectrum with a regularized minimum variance distortionless response (RMVDR) spectrum estimator instead of the discrete Fourier transform-based direct spectrum estimator used in many front-ends, including the conventional MFCC. Direct spectrum estimators, e.g., the single-tapered periodogram, have high variance and perform poorly under noisy and adverse conditions. To reduce this performance drop, we propose to increase the robustness of speech recognition systems by extracting more robust features based on the regularized MVDR technique. The RMVDR spectrum estimator has low spectral variance and is robust to mismatched conditions. Three robust acoustic front-ends are built on the RMVDR spectrum estimator: regularized MVDR-based cepstral coefficients (RMCC), robust RMVDR cepstral coefficients (RRMCC), and normalized RMVDR cepstral coefficients (NRMCC). In addition to the RMVDR spectrum estimator, RRMCC and NRMCC apply auditory-domain spectrum enhancement methods, namely auditory spectrum enhancement (ASE) and medium duration power bias subtraction (MDPBS), respectively, to improve the robustness of the feature extraction method. Speech recognition experiments are conducted on the AURORA-4 large vocabulary continuous speech recognition corpus, and performance is compared with Mel frequency cepstral coefficients (MFCC), perceptual linear prediction (PLP), MVDR spectrum estimator-based MFCC, perceptual MVDR (PMVDR), cochlear filterbank cepstral coefficients (CFCC), power normalized cepstral coefficients (PNCC), the ETSI advanced front-end (ETSI-AFE), and the robust feature extractor (RFE) of [6].
Experimental results demonstrate that the proposed robust feature extractors outperform the other robust front-ends in terms of percentage word error rate on the AURORA-4 large vocabulary continuous speech recognition (LVCSR) task under both clean and multi-condition training. In clean training conditions, on average, RRMCC and NRMCC provide significant reductions in word error rate over the rest of the front-ends. In multi-condition training, RMCC, RRMCC, and NRMCC perform slightly better in terms of average word error rate than the rest of the front-ends used in this work.
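
To make the spectrum estimation step concrete, the following is a minimal sketch of an MVDR (Capon) power spectrum for a single speech frame, P(ω) = 1 / (v(ω)ᴴ R⁻¹ v(ω)), where R is the Toeplitz autocorrelation matrix and v(ω) is the steering vector. The abstract does not say how the regularization is done, so the diagonal loading used here is an assumption standing in for it, not the authors' RMVDR method. The function name `mvdr_spectrum` and the parameters `order`, `n_freqs`, and `reg` are hypothetical.

```python
import numpy as np

def mvdr_spectrum(frame, order=12, n_freqs=257, reg=1e-3):
    """MVDR power spectrum of one frame on n_freqs points in [0, pi].

    Assumption: regularization is plain diagonal loading of the
    autocorrelation matrix; the paper's RMVDR regularizer may differ.
    """
    x = np.asarray(frame, dtype=float)
    x = x * np.hamming(len(x))  # taper the frame before estimating lags

    # Biased autocorrelation estimate for lags 0..order.
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    r = r / len(x)

    # Symmetric Toeplitz matrix R[i, j] = r[|i - j|].
    lags = np.abs(np.subtract.outer(np.arange(order + 1), np.arange(order + 1)))
    R = r[lags]

    # Diagonal loading, scaled by the frame energy r[0] (assumed regularizer).
    R = R + reg * r[0] * np.eye(order + 1)

    # Steering vectors v(w) = [1, e^{jw}, ..., e^{j*order*w}]^T, one per column.
    omegas = np.linspace(0.0, np.pi, n_freqs)
    V = np.exp(1j * np.outer(np.arange(order + 1), omegas))

    # Quadratic form v^H R^{-1} v for every frequency at once.
    Rinv_V = np.linalg.solve(R.astype(complex), V)
    denom = np.real(np.sum(np.conj(V) * Rinv_V, axis=0))
    return 1.0 / denom
```

In a cepstral front-end of the kind the abstract describes, a spectrum like this would take the place of the periodogram before the filterbank, logarithm, and DCT stages. A larger `reg` gives a smoother estimate with less spectral resolution.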


Similar resources

Regularized MVDR spectrum estimation-based robust feature extractors for speech recognition

In this paper, we present two robust feature extractors that use a regularized minimum variance distortionless response (RMVDR) spectrum estimator instead of the discrete Fourier transform-based direct spectrum estimator, used in many front-ends including the conventional MFCC, for estimating the speech power spectrum. Direct spectrum estimators, e.g., single tapered periodogram, have high vari...

Perceptual MVDR-based cepstral coefficients (PMCCs) for robust speech recognition

This paper describes a robust feature extraction technique for continuous speech recognition. Central to the technique is the Minimum Variance Distortionless Response (MVDR) method of spectrum estimation. We incorporate perceptual information directly into the spectrum estimation. This provides improved robustness and computational efficiency when compared with the previously proposed MVDR-MFC...

New Features Using Robust MVDR Spectrum of Filtered Autocorrelation Sequence for Robust Speech Recognition

This paper presents a novel noise-robust feature extraction method for speech recognition using the robust perceptual minimum variance distortionless response (MVDR) spectrum of temporally filtered autocorrelation sequence. The perceptual MVDR spectrum of the filtered short-time autocorrelation sequence can reduce the effects of residue of the nonstationary additive noise which remains after fi...

MVDR based feature extraction for robust speech recognition

This paper describes a robust feature extraction method for continuous speech recognition. Central to the method is the Minimum Variance Distortionless Response (MVDR) method of spectrum estimation and a feature trajectory smoothing technique for reducing the variance in the feature vectors. The above method, when evaluated on continuous speech recognition tasks in a stationary and moving car, ...

Warping and Scaling of the Minimum Variance Distortionless Response

Spectral estimation based on the minimum variance distortionless response (MVDR) is well-known in the signal processing literature and has been shown to be superior to linear prediction for robust speech recognition. In this work we propose two techniques to improve the resolution and the robustness of the MVDR spectral estimate: The first is a time-domain technique to estimate an all-pole mode...

Journal:

Speech Communication

Volume 73

Publication date: 2015
